

Section: New Results

Learning and statistical models

Fast and Robust Archetypal Analysis for Representation Learning

Participants : Yuansi Chen, Julien Mairal, Zaid Harchaoui.

In [9], we revisit a pioneering unsupervised learning technique called archetypal analysis, which is related to successful data analysis methods such as sparse coding and non-negative matrix factorization. Since it was proposed, archetypal analysis has not gained much popularity, even though it produces more interpretable models than many alternatives. Because no efficient implementation has ever been made publicly available, its application to important scientific problems may have been severely limited. Our goal is to bring archetypal analysis back into favour. We propose a fast optimization scheme using an active-set strategy, and provide an efficient open-source implementation interfaced with Matlab, R, and Python. We then demonstrate the usefulness of archetypal analysis for computer vision tasks such as codebook learning, signal classification, and large image collection visualization.

In Figure 5, we present some of the archetypes learned for the query “Paris” from 36,600 Flickr images uploaded in 2012 and 2013 and sorted by relevance.

Figure 5. Classical landmarks appear on the left, which is not surprising since Flickr contains a large number of vacation pictures. In the middle, we display several archetypes that we did not expect, including ones about soccer, graffiti, food, flowers, and social gatherings. Finally, we display on the right some archetypes that do not seem to carry a clear semantic meaning but capture scene compositions or textures that are common in the dataset.
IMG/chen1.png
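
To make the approach concrete, the sketch below implements a minimal archetypal analysis in Python with NumPy: data columns are approximated by convex combinations of archetypes, which are themselves convex combinations of data points. It uses plain alternating projected-gradient updates rather than the active-set strategy of [9], and all function names and parameters are illustrative, not the released implementation.

    import numpy as np

    def project_simplex(v):
        # Euclidean projection of a vector onto the probability simplex.
        u = np.sort(v)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u * np.arange(1, v.size + 1) > css - 1.0)[0][-1]
        theta = (css[rho] - 1.0) / (rho + 1.0)
        return np.maximum(v - theta, 0.0)

    def project_simplex_cols(M):
        # Project every column of M onto the simplex.
        return np.apply_along_axis(project_simplex, 0, M)

    def archetypal_analysis(X, p, n_iter=200, seed=0):
        # Approximate X (m x n) by (X @ B) @ A, where the columns of
        # A (p x n) and B (n x p) are constrained to the probability simplex.
        rng = np.random.default_rng(seed)
        m, n = X.shape
        A = project_simplex_cols(rng.random((p, n)))
        B = project_simplex_cols(rng.random((n, p)))
        for _ in range(n_iter):
            Z = X @ B                                  # current archetypes (m x p)
            step_a = 1.0 / (np.linalg.norm(Z, 2) ** 2 + 1e-12)
            A = project_simplex_cols(A - step_a * (Z.T @ (Z @ A - X)))
            R = X @ B @ A - X                          # residual (m x n)
            step_b = 1.0 / ((np.linalg.norm(X, 2) * np.linalg.norm(A, 2)) ** 2 + 1e-12)
            B = project_simplex_cols(B - step_b * (X.T @ R @ A.T))
        return X @ B, A, B

For a data matrix X with samples as columns, Z, A, B = archetypal_analysis(X, p=16) would return 16 archetypes Z together with the simplex-constrained coefficients.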

Conditional Gradient Algorithms for Norm-Regularized Smooth Convex Optimization

Participants : Zaid Harchaoui, Anatoli Juditsky, Arkadii Nemirovski.

In this paper [6], we consider convex optimization problems arising in machine learning in high-dimensional settings. For several important learning problems, such as noisy matrix completion, state-of-the-art optimization approaches such as composite minimization algorithms are difficult to apply and do not scale up to large datasets. We study three conditional gradient-type algorithms, i.e., first-order optimization algorithms that require a linear minimization oracle but do not require a proximal oracle. These new algorithms are suitable for large-scale problems and enjoy finite-time convergence guarantees. Promising experimental results are presented on two large-scale real-world datasets. The method is illustrated in Figure 6.

Figure 6. Overview of the composite conditional gradient algorithm, which minimizes F(x) := f(x) + λ‖x‖_𝒜, where f is smooth and ‖·‖_𝒜 is an atomic-decomposition norm.
IMG/ccg.png
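
To illustrate the linear minimization oracle at the heart of these methods, the sketch below runs a plain constrained conditional gradient (Frank-Wolfe) step for least squares over an ℓ1-ball; the composite algorithms of [6] handle the penalized objective F(x) = f(x) + λ‖x‖_𝒜 along similar lines. The problem instance and function names are illustrative only.

    import numpy as np

    def lmo_l1(grad, radius):
        # Linear minimization oracle for the l1-ball: the minimizer of
        # <grad, s> over ||s||_1 <= radius is a signed, scaled coordinate vector.
        i = np.argmax(np.abs(grad))
        s = np.zeros_like(grad)
        s[i] = -radius * np.sign(grad[i])
        return s

    def conditional_gradient(A, b, radius, n_iter=500):
        # Frank-Wolfe for min 0.5 * ||A x - b||^2 subject to ||x||_1 <= radius.
        # Only the LMO is required; no projection or proximal operator is used.
        x = np.zeros(A.shape[1])
        for k in range(n_iter):
            grad = A.T @ (A @ x - b)
            s = lmo_l1(grad, radius)
            gamma = 2.0 / (k + 2.0)      # standard O(1/k) step-size schedule
            x = (1.0 - gamma) * x + gamma * s
        return x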

A Smoothing Approach for Composite Conditional Gradient with Nonsmooth Loss

Participants : Federico Pierucci, Zaid Harchaoui, Jérôme Malick [BIPOP Team, Inria] .

In [25], we consider learning problems where the nonsmoothness lies both in the convex empirical risk and in the regularization penalty. Examples of such problems include learning with nonsmooth loss functions and an atomic-decomposition regularization penalty. Such doubly nonsmooth learning problems prevent the use of the recently proposed composite conditional gradient algorithms for training, which are particularly attractive for large-scale applications; indeed, these algorithms rely on the assumption that the empirical risk part of the objective is smooth. We propose a composite conditional gradient algorithm with smoothing to tackle such learning problems. We set up a framework that allows us to systematically design parameterized smooth surrogates of nonsmooth loss functions. We then propose a smoothed composite conditional gradient algorithm, for which we prove theoretical guarantees on the accuracy. We present promising experimental results on collaborative filtering tasks (see Figure 7).

Figure 7. Illustration of the smooth surrogate with parameter μ (green) of the absolute value function (black).
IMG/nonsmooth_ccg.png
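
One classical construction of such a surrogate, consistent with Figure 7, is the Moreau-envelope (Huber-type) smoothing of the absolute value; the sketch below shows this particular instance in Python. It is only one standard member of the family of parameterized surrogates studied in [25], not necessarily the exact surrogate used there.

    import numpy as np

    def smoothed_abs(x, mu):
        # Huber-type smooth surrogate of |x| with smoothing parameter mu
        # (the Moreau envelope of the absolute value):
        # x^2 / (2*mu) if |x| <= mu, and |x| - mu/2 otherwise.
        x = np.asarray(x, dtype=float)
        return np.where(np.abs(x) <= mu, x ** 2 / (2.0 * mu), np.abs(x) - mu / 2.0)

    def smoothed_abs_grad(x, mu):
        # Gradient of the surrogate; it is (1/mu)-Lipschitz, so the smoothed
        # loss can be handled by a composite conditional gradient algorithm.
        x = np.asarray(x, dtype=float)
        return np.clip(x / mu, -1.0, 1.0)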

Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning

Participant : Julien Mairal.

In this paper [27] , we study optimization methods consisting of iteratively minimizing surrogates of an objective function, as illustrated in Figure 8 . We introduce a new incremental scheme that experimentally matches or outperforms state-of-the-art solvers for large-scale optimization problems typically arising in machine learning.

Figure 8. Illustration of the basic majorization-minimization principle. We compute a surrogate g_n of the objective function f around the current estimate θ_{n-1}. The new estimate θ_n is a minimizer of g_n. The approximation error h_n is smooth.
IMG/julien3.png
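
The sketch below illustrates the basic batch majorization-minimization principle of Figure 8 with the standard Lipschitz-gradient quadratic majorizer, whose minimizer is a gradient step of size 1/L. It is only a toy illustration; the incremental scheme of [27], which updates one surrogate per data point at a time, is not reproduced here.

    import numpy as np

    def mm_quadratic(grad_f, L, theta0, n_iter=100):
        # Majorization-minimization with the quadratic surrogate
        #   g_n(theta) = f(theta_{n-1}) + <grad f(theta_{n-1}), theta - theta_{n-1}>
        #                + (L / 2) * ||theta - theta_{n-1}||^2,
        # valid when grad f is L-Lipschitz; its minimizer is a 1/L gradient step.
        theta = np.asarray(theta0, dtype=float)
        for _ in range(n_iter):
            theta = theta - grad_f(theta) / L
        return theta

    # Toy usage on least squares f(theta) = 0.5 * ||A theta - b||^2.
    A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    b = np.array([1.0, 2.0, 3.0])
    L = np.linalg.norm(A, 2) ** 2
    theta = mm_quadratic(lambda t: A.T @ (A @ t - b), L, np.zeros(2), n_iter=2000)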

Efficient RNA Isoform Identification and Quantification from RNA-Seq Data with Network Flows

Participants : Elsa Bernard [Institut Curie, Ecole des Mines-ParisTech] , Laurent Jacob [CNRS, LBBE Laboratory] , Julien Mairal [correspondent] , Jean-Philippe Vert [Institut Curie, Ecole des Mines-ParisTech] .

Several state-of-the-art methods for isoform identification and quantification are based on ℓ1-regularized regression, such as the Lasso. However, explicitly listing the possibly exponentially large set of candidate transcripts is intractable for genes with many exons. For this reason, existing approaches using the ℓ1-penalty are either restricted to genes with few exons or only run the regression algorithm on a small set of preselected isoforms. In [4], we introduce a new technique called FlipFlop, which can efficiently tackle the sparse estimation problem on the full set of candidate isoforms by using network flow optimization. Our technique removes the need for a preselection step, leading to better isoform identification while keeping a low computational cost. Experiments with synthetic and real RNA-Seq data confirm that our approach is more accurate than alternative methods and one of the fastest available. Figure 9 presents the graph on which the network flow optimization is performed.

Figure 9. Graph on which we perform network flow optimization. Nodes represent observed reads, and paths on the graph correspond to isoforms.
IMG/elsa1.png
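
To give a flavour of why network flows matter here, the toy sketch below enumerates source-to-sink paths in a small hypothetical splicing graph (nodes stand for exons in this simplified picture, unlike the read-level nodes of Figure 9); each path is a candidate isoform. FlipFlop avoids exactly this explicit, potentially exponential enumeration by solving the sparse estimation problem with flows on the graph.

    def isoform_paths(graph, source, sink):
        # Enumerate source-to-sink paths in a toy splicing DAG; each path
        # corresponds to a candidate isoform. The network flow formulation
        # of [4] never lists these paths explicitly.
        if source == sink:
            return [[sink]]
        paths = []
        for nxt in graph.get(source, []):
            for tail in isoform_paths(graph, nxt, sink):
                paths.append([source] + tail)
        return paths

    # Hypothetical 3-exon gene: nodes are exons, edges are observed junctions.
    toy_graph = {"s": ["e1"], "e1": ["e2", "e3"], "e2": ["e3"], "e3": ["t"]}
    print(isoform_paths(toy_graph, "s", "t"))
    # [['s', 'e1', 'e2', 'e3', 't'], ['s', 'e1', 'e3', 't']]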

Riemannian Sparse Coding for Positive Definite Matrices

Participants : Anoop Cherian, Suvrit Sra [MPI] .

Inspired by the great success of sparse coding for vector-valued data, our goal in this work [12] is to represent symmetric positive definite (SPD) data matrices as sparse linear combinations of atoms from a dictionary, where each atom is itself an SPD matrix. Since SPD matrices follow a non-Euclidean (in fact Riemannian) geometry, existing sparse coding techniques for Euclidean data cannot be directly extended. Prior works have approached this problem by defining a sparse coding loss function using either extrinsic similarity measures (such as the log-Euclidean distance) or kernelized variants of statistical measures (such as the Stein divergence, Jeffreys divergence, etc.). In contrast, we propose to use the intrinsic Riemannian distance on the manifold of SPD matrices. Our main contribution is a novel mathematical model for sparse coding of SPD matrices; we also present a computationally simple algorithm for optimizing our model. Experiments on several computer vision datasets showcase superior classification and retrieval performance compared with state-of-the-art approaches.
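
For reference, the intrinsic distance commonly used in this setting is the affine-invariant Riemannian metric on SPD matrices; the sketch below computes it in Python from generalized eigenvalues and checks its affine invariance on random SPD matrices. This is only a minimal illustration of the distance; the dictionary learning and sparse coding optimization of [12] are not reproduced here.

    import numpy as np
    from scipy.linalg import eigh

    def airm_distance(X, Y):
        # Affine-invariant Riemannian distance between SPD matrices:
        #   d(X, Y) = || log(X^{-1/2} Y X^{-1/2}) ||_F,
        # computed from the generalized eigenvalues of the pencil (Y, X).
        w = eigh(Y, X, eigvals_only=True)   # eigenvalues of X^{-1} Y, all > 0 for SPD inputs
        return np.sqrt(np.sum(np.log(w) ** 2))

    # Toy check: the distance is invariant under congruence X -> C X C^T.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    X = A @ A.T + 4 * np.eye(4)
    B = rng.standard_normal((4, 4))
    Y = B @ B.T + 4 * np.eye(4)
    C = rng.standard_normal((4, 4))
    print(np.isclose(airm_distance(X, Y), airm_distance(C @ X @ C.T, C @ Y @ C.T)))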